Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms
Abstract
1 Context

Given a finite set of $m$ examples $z_1, \dots, z_m$ and a strictly convex differentiable loss function $\ell(z, \theta)$ defined on a parameter vector $\theta \in \mathbb{R}^d$, we are interested in minimizing the cost function

$$\min_\theta C(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell(z_i, \theta).$$

One way to perform such a minimization is to use a stochastic gradient algorithm. Starting from some initial value $\theta[1]$, iteration $t$ consists in picking an example $z[t]$ and applying the stochastic gradient update

$$\theta[t+1] = \theta[t] - \eta_t \, \frac{\partial \ell}{\partial \theta}(z[t], \theta[t]),$$

where the sequence of positive scalars $\eta_t$ satisfies the well-known Robbins-Monro conditions $\sum_t \eta_t = \infty$ and $\sum_t \eta_t^2 < \infty$. We consider three ways to pick the example $z[t]$ at each iteration:

• Random: Examples are drawn uniformly from the training set at each iteration.
• Cycle: Examples are picked sequentially from the randomly shuffled training set, that is, $z[km + t] = z_{\sigma(t)}$, where $\sigma$ is a random permutation.
• Shuffle: Examples are still picked sequentially but the training set is shuffled before each pass, that is, $z[km + t] = z_{\sigma_k(t)}$, where the $\sigma_k$ are random permutations of $\{1, \dots$
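The three example-selection schemes described in the abstract can be illustrated with a minimal Python sketch. This is not the author's code: the function names `example_indices`, `sgd`, and `squared_loss_grad`, the step size $\eta_t = \eta_0 / t$ (one choice that satisfies the Robbins-Monro conditions), and the scalar squared loss are all assumptions made for illustration.

```python
import random


def example_indices(m, n_epochs, strategy, rng):
    """Yield the index of the example used at each iteration (hypothetical helper)."""
    if strategy == "random":
        # Draw uniformly, with replacement, at every iteration.
        for _ in range(n_epochs * m):
            yield rng.randrange(m)
    elif strategy == "cycle":
        sigma = list(range(m))
        rng.shuffle(sigma)  # one permutation, reused for every pass
        for _ in range(n_epochs):
            yield from sigma
    elif strategy == "shuffle":
        sigma = list(range(m))
        for _ in range(n_epochs):
            rng.shuffle(sigma)  # fresh permutation before each pass
            yield from sigma
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")


def sgd(examples, grad, theta0, n_epochs, strategy, eta0=1.0, seed=0):
    """Run theta[t+1] = theta[t] - eta_t * grad(z[t], theta[t]) on a scalar parameter."""
    rng = random.Random(seed)
    theta = theta0
    order = example_indices(len(examples), n_epochs, strategy, rng)
    for t, i in enumerate(order, start=1):
        eta_t = eta0 / t  # assumed schedule: sum eta_t diverges, sum eta_t^2 converges
        theta = theta - eta_t * grad(examples[i], theta)
    return theta


def squared_loss_grad(z, theta):
    """Gradient in theta of the assumed loss l(z, theta) = 0.5 * (theta - z)**2."""
    return theta - z


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    for strategy in ("random", "cycle", "shuffle"):
        print(strategy, sgd(data, squared_loss_grad, 0.0, 50, strategy))
```

With this particular loss and `eta0=1.0`, each iterate is the running mean of the examples visited so far, so the cycle and shuffle runs return the mean of `data` after any whole number of passes, while the random run generally does not. That is a property of this toy setup and says nothing about convergence rates in general.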
Related resources
Multichannel recursive-least-square algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems
In the last ten years, there has been much research on active noise control (ANC) systems and transaural sound reproduction (TSR) systems. In those fields, multichannel FIR adaptive filters are extensively used. For the learning of FIR adaptive filters, recursive-least-squares (RLS) algorithms are known to produce a faster convergence speed than stochastic gradient descent techniques, such as t...
Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities
The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...
Asynchronous Distributed Semi-Stochastic Gradient Optimization
With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at t...
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast conv...
Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tack...